The George Washington University System for the Code-Switching Workshop Shared Task 2016

Authors

  • Mohamed Al-Badrashiny
  • Mona Diab
Abstract

We describe our work in the EMNLP 2016 second code-switching shared task: a generic, language-independent framework for linguistic code switch point detection (LCSPD). The system uses character-level 5-gram and word-level unigram language models to train a conditional random fields (CRF) model that classifies input words into various languages. We participated in the Modern Standard Arabic (MSA)-dialectal Arabic (DA) and Spanish-English tracks, obtaining weighted average F-scores of 0.83 and 0.91 on MSA-DA and EN-SP, respectively.
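
As a rough illustration of how language-model scores might be turned into per-token features for a CRF, the sketch below builds toy word-unigram and character-5-gram counts for each language and emits one feature dictionary per token. It is not the authors' implementation: the class `ToyLanguageModel`, the function `token_features`, the add-one smoothing, and the tiny word lists are all assumptions made for illustration. The character model here scores each 5-gram by its smoothed relative frequency, a simplification of a true conditional 5-gram language model.

```python
import math
from collections import Counter


def char_ngrams(word, n=5):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class ToyLanguageModel:
    """Hypothetical per-language model: add-one smoothed word-unigram and
    character-5-gram relative frequencies (an assumption, not the paper's LM)."""

    def __init__(self, words, n=5):
        self.n = n
        self.word_counts = Counter(words)
        self.char_counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.word_total = sum(self.word_counts.values())
        self.char_total = sum(self.char_counts.values())

    def word_logprob(self, word):
        vocab = len(self.word_counts) + 1  # +1 reserves mass for unseen words
        return math.log((self.word_counts[word] + 1) / (self.word_total + vocab))

    def char_logprob(self, word):
        vocab = len(self.char_counts) + 1  # +1 reserves mass for unseen n-grams
        grams = char_ngrams(word, self.n)
        total = sum(
            math.log((self.char_counts[g] + 1) / (self.char_total + vocab))
            for g in grams
        )
        return total / len(grams)  # average, so long words are not penalised


def token_features(tokens, models):
    """Build one feature dict per token from every language model's scores."""
    features = []
    for tok in tokens:
        word = tok.lower()
        feats = {}
        for lang, lm in models.items():
            feats[f"word_lp_{lang}"] = lm.word_logprob(word)
            feats[f"char_lp_{lang}"] = lm.char_logprob(word)
        features.append(feats)
    return features


# Toy training data (made up for this example).
models = {
    "en": ToyLanguageModel(["the", "house", "is", "big", "the"]),
    "es": ToyLanguageModel(["la", "casa", "es", "grande", "la"]),
}
features = token_features(["the", "casa"], models)
```

A sequence of such dictionaries, paired with per-token language labels, is the kind of input a CRF trainer typically consumes; the CRF then learns how to weigh the competing language-model scores in context.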

Similar resources

Interrelation of Preventive Care Benefits and Shared Costs under the Affordable Care Act (ACA)

With the implementation of the Affordable Care Act (ACA), access to insurance and coverage of preventive care services has been expanded. By removing the barrier of shared costs for preventive care, it is expected that an increase in utilization of preventive care services will reduce the cost of chronic diseases. Early detection and treatment is anticipated to be less costly than treatment at ...

The Tel Aviv University System for the Code-Switching Workshop Shared Task

We describe our entry in the EMNLP 2014 code-switching shared task. Our system is based on a sequential classifier, trained on the shared training set using various character- and word-level features, some calculated using large monolingual corpora. We participated in the Twitter-genre Spanish-English track, obtaining an accuracy of 0.868 when measured on the tweet level and 0.858 on the word l...

The Howard University System Submission for the Shared Task in Language Identification in Spanish-English Codeswitching

This paper describes the Howard University system for the language identification shared task of the Second Workshop on Computational Approaches to Code Switching. Our system is based on prior work on Swahili-English token-level language identification. Our system primarily uses character n-gram, prefix and suffix features, letter case and special character features along with previously existin...

Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description

We describe our present system for language identification as a part of the EMNLP 2016 Shared Task. We were provided with the Spanish-English corpus composed of tweets. We have employed a predictor-corrector algorithm to accomplish the goals of this shared task and analyzed the results obtained.

Presurgical Language Mapping in Patients With Intractable Epilepsy: A Review Study

Introduction: About 20% to 30% of patients with epilepsy are diagnosed with drug-resistant epilepsy, and one-third of these are candidates for epilepsy surgery. Surgical resection of the epileptogenic tissue is a well-established method for treating patients with intractable focal epilepsy. Determining language laterality and locality is an important part of a comprehensive epilepsy program befo...

Publication date: 2016